This document details the analyses of the free classification task given to a total of 194 participants divided into 5 groups. The groups were English monolinguals (n = 62), East Asian multilingual speakers (n = 26), Southeast Asian multilingual speakers (n = 28), South Asian speakers (n = 24), and multilingual speakers of other languages (n = 59).
Several statistical analyses were carried out to determine whether language background has an impact on the processing of accented English. First, descriptive statistics are reported showing the total number of categories created by each of the 5 groups when given the same 45 speakers. Next, to determine how appropriate or correct these categories were, 3 error rates were calculated for each group: a 2-category, a 5-category and a 15-category error rate. The 2-category error rate measured how often participants inappropriately grouped an Asian language speaker with an English language speaker, or vice versa. The 5-category error rate measured how often participants inappropriately grouped a speaker from any of the 5 categories (English, International English, East Asian, South Asian or Southeast Asian) with a speaker from another category. Finally, the 15-category error rate measured how often participants inappropriately grouped a speaker of any of the 15 languages with a speaker of a different language. A single-member category also counted as an error, since a fully correct category would contain at least 3 speakers (all of the same language). Thus, the maximum number of errors was 45, which would occur if a participant created 45 single-member categories. The error rate in each case was the total number of errors divided by the maximum number of errors (45). The overall label of a participant-created category was whichever true category occurred most frequently within it. In the event of a tie, the overall label was assigned arbitrarily, and this did not affect the total number of errors. For example, if 4 speakers were grouped in a category, 2 of them American English and the other 2 East Asian, this would be counted as two errors regardless of the overall category label.
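The error counting described above can be illustrated with a minimal Python sketch. This is not the code used for the reported analyses; the input format (one list of true labels per participant-created category) and the function names are assumptions made for illustration. The same function would serve for the 2-, 5- and 15-category error rates, depending on how coarse the true labels passed in are.

```python
from collections import Counter

MAX_ERRORS = 45  # one per speaker; reached when every speaker is placed alone


def count_errors(categories):
    """Count errors for one participant.

    `categories` is a list of participant-created categories, each given as
    a list of the true labels of the speakers placed in it (hypothetical
    input format).
    """
    errors = 0
    for members in categories:
        if len(members) == 1:
            # A single-member category counts as one error.
            errors += 1
            continue
        # The overall label is the most frequent true label. Ties are broken
        # arbitrarily, which does not change the number of errors.
        majority_size = Counter(members).most_common(1)[0][1]
        errors += len(members) - majority_size
    return errors


def error_rate(categories):
    """Total errors divided by the maximum number of errors (45)."""
    return count_errors(categories) / MAX_ERRORS


# Worked example from the text: two American English and two East Asian
# speakers in one category give two errors whichever label is chosen.
example = [["American English", "American English", "East Asian", "East Asian"]]
print(count_errors(example))  # prints 2
```

Treating the tie case through the size of the majority, rather than through its label, is what makes the arbitrary label harmless in this sketch.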
In addition to the total number of categories, the closeness of those
categories could provide evidence of between-group differences. This
analysis makes use of multidimensional scaling (MDS), in which a
(dis)similarity matrix allows categorization differences between groups
to be visualized and examined in a two-dimensional space. MDS, which has
been used in previous papers, was chosen over hierarchical clustering
(HC) analysis, which assumes that the horizontal category
differences/distances are equal. In other words, MDS allows for a more
fine-grained analysis of the differences than HC, since it measures
differences in two dimensions, rather than one (height). The MDS was
used to assign an x and y coordinate to each speaker (n = 45) for each
group (n = 5). An ellipse around the points of each category was created
using the stat_ellipse function in ggplot2 (cite) for the purpose of
visualization, based on the method put forward by Fox and Weisberg
(2011). A centroid was also calculated for each category within each
group by averaging the x and y coordinates. The distance between
centroids within a particular group was measured to provide evidence of
how distinctly that group distinguished categories. As a measure of
within-category tightness, the Euclidean distance from each individual
point to its centroid was calculated. The figure below shows the MDS for
each group. The color of each point was assigned based on the actual
category of the language, which was unknown to the participant. Here, a
smaller ellipse (points being closer together) suggests that the
speakers of a particular language group are being categorized as more
similar to one another.
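The centroid and distance measures described above can be sketched in a few lines of Python. The coordinates and category names below are invented for illustration and are not taken from the MDS solutions reported here; the sketch only assumes that each speaker already has an x and y coordinate.

```python
import math


def centroid(points):
    """Average the x and y coordinates of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def distances_to_centroid(points):
    """Euclidean distance from each point to the centroid of its category."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]


# Hypothetical MDS coordinates for two categories as placed by one group.
coords = {
    "category_a": [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)],
    "category_b": [(4.0, 4.0), (6.0, 4.0), (5.0, 7.0)],
}

centroids = {name: centroid(pts) for name, pts in coords.items()}
tightness = {name: distances_to_centroid(pts) for name, pts in coords.items()}

# Distance between the two centroids: larger values suggest the group kept
# the two categories further apart.
between = math.dist(centroids["category_a"], centroids["category_b"])
print(centroids, between)
```

Smaller values in `tightness` correspond to the tighter, more consistent categories discussed in the text.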
Finally, a Bayesian multilevel regression model was run to determine whether there were differences between groups and conditions in terms of how concentrated a given category was. The outcome variable of the model was the Euclidean distance of a point from its centroid, where a smaller Euclidean distance would suggest an overall tighter/more consistent category. The fixed effect predictors included group (5 levels: English monolingual, East Asian, South Asian, Southeast Asian, and Non-Asian multilingual), language group (5 levels: American English, International English, East Asian, Southeast Asian, and South Asian) and their interaction, with a random intercept for individual language. The model included the default brms priors (Student’s t distribution with 3 degrees of freedom). The model was run with 4000 iterations of Hamiltonian Monte Carlo sampling (1000 warm-up), across 4 chains and 8 processing cores.
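For orientation, the model structure described above can be written out as follows. This is a sketch under stated assumptions rather than the exact specification that was fitted: it assumes a Gaussian likelihood (suggested by the sigma parameter in the detailed parameter table) and includes the group by language group interaction terms that appear in that table. The symbols are introduced here for illustration: $d_i$ is the distance of observation $i$ from its centroid, $g[i]$ its participant group, $\ell[i]$ its language group, and $k[i]$ its individual language.

$$
\begin{aligned}
d_{i} &\sim \mathrm{Normal}(\mu_{i}, \sigma) \\
\mu_{i} &= \beta_{0} + \beta^{\text{group}}_{g[i]} + \beta^{\text{lang}}_{\ell[i]} + \beta^{\text{group} \times \text{lang}}_{g[i],\,\ell[i]} + u_{k[i]} \\
u_{k} &\sim \mathrm{Normal}(0, \tau)
\end{aligned}
$$

Under this reading, $\beta_{0}$ is the expected distance for the reference levels of both predictors, and $u_{k}$ is the random intercept for each of the 15 individual languages.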
Total categories were calculated by the unique groups made by each participant. Mean total categories were calculated for each group. The figure below shows these averages, with the standard deviation in parentheses.
This boxplot shows, for each participant, the number of members in the category that contained the most members. The mean and standard deviation of the size of this largest category for each group are included on the right side of the plot. Interestingly, the English monolinguals had the fewest speakers on average in their largest category, while also creating the most categories on average.
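The two descriptive measures above (total categories and size of the largest category) can be sketched as follows. The participant data and the input format are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which variant was reported.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(assignments):
    """Return (total categories, size of largest category) for a participant.

    `assignments` maps each speaker to the category label the participant
    gave it (hypothetical input format).
    """
    sizes = Counter(assignments.values())
    return len(sizes), max(sizes.values())


# Two invented participants who each sorted four speakers.
participants = {
    "p1": {"s1": "A", "s2": "A", "s3": "B", "s4": "C"},
    "p2": {"s1": "A", "s2": "A", "s3": "A", "s4": "B"},
}

totals, largest = zip(*(summarize(a) for a in participants.values()))

# Group-level mean and sample standard deviation of each measure.
print(mean(totals), stdev(totals))
print(mean(largest), stdev(largest))
```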
The figure below shows the 2-, 5- and 15-category error rates for each group. The tables report the same error rates numerically and correspond to the values in the figure.
Error rates by each group

| Group | 2-category error rate (sd) |
|---|---|
| East Asian | 0.031 (0.035) |
| English monolingual | 0.023 (0.043) |
| Non-Asian multilingual | 0.052 (0.054) |
| South Asian | 0.016 (0.02) |
| South-east Asian | 0.034 (0.04) |

| Group | 5-category error rate (sd) |
|---|---|
| East Asian | 0.213 (0.077) |
| English monolingual | 0.189 (0.091) |
| Non-Asian multilingual | 0.303 (0.127) |
| South Asian | 0.171 (0.062) |
| South-east Asian | 0.23 (0.136) |

| Group | 15-category error rate (sd) |
|---|---|
| East Asian | 0.551 (0.1) |
| English monolingual | 0.52 (0.092) |
| Non-Asian multilingual | 0.624 (0.101) |
| South Asian | 0.563 (0.123) |
| South-east Asian | 0.566 (0.114) |
The results of the model are shown below, where the
conditional_effects function was used to plot the estimate
of each group for each language. Additional model details, including a
forest plot, a model table, and a detailed model significance table, are
included in the section called “appendix”.
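As a reading aid for the coefficient tables in the appendix, the estimate for a given group and language group cell can be approximated by adding the relevant coefficients. The sketch below uses posterior medians copied from the detailed parameter table. It assumes that the intercept corresponds to the East Asian participant group and the American English language group, since those are the levels without their own coefficients. Summing medians only approximates the median of the summed posterior, so the result is a rough guide and not a substitute for the plotted estimates.

```python
# Posterior medians copied from the detailed parameter table.
coef = {
    "Intercept": 1.531,
    "groupNonmulti": 2.302,
    "lang_2East": 2.432,
    "groupNonmulti:lang_2East": -2.178,
}


def cell_estimate(group=None, lang=None):
    """Approximate distance from centroid for one group x language cell.

    Passing None keeps the (assumed) reference level of that predictor.
    """
    total = coef["Intercept"]
    if group:
        total += coef[f"group{group}"]
    if lang:
        total += coef[f"lang_2{lang}"]
    if group and lang:
        total += coef[f"group{group}:lang_2{lang}"]
    return total


# Reference cell, and Non-Asian multilinguals rating East Asian speakers.
print(round(cell_estimate(), 3))
print(round(cell_estimate("Nonmulti", "East"), 3))
```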
The following items present the full Bayesian model: a forest plot, a table of parameter estimates, and a more detailed table reporting the probability of an effect being positive or negative (probability of direction; pd), the probability of a “significant” (non-zero) effect (ps), and the highest density interval (with the mean, median, and the upper and lower bounds of the 95% most probable parameter estimates).
| Predictors | Estimates (dist from center) | CI (95%) |
|---|---|---|
| Intercept | 1.53 | 0.18 – 2.85 |
| groupEnglishmono | 0.18 | -1.57 – 1.92 |
| groupNonmulti | 2.30 | 0.51 – 4.08 |
| groupSouthAsian | 0.03 | -1.70 – 1.83 |
| groupSoutheastAsian | 0.07 | -1.73 – 1.86 |
| lang_2East | 2.43 | 0.56 – 4.26 |
| lang_2International | -0.75 | -2.63 – 1.10 |
| lang_2South | 2.78 | 0.83 – 4.64 |
| lang_2SouthEast | 3.28 | 1.42 – 5.17 |
| Random Effects | | |
| σ² | 3.51 | |
| τ₀₀ (lang_3) | 0.09 | |
| ICC | 0.02 | |
| N (lang_3) | 15 | |
| Observations | 225 | |
| Marginal R² / Conditional R² | 0.442 / 0.449 | |
| Parameter | Component | Median | Mean | MAP | CI | CI_low | CI_high | pd | ps | Rhat | ESS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| b_Intercept | conditional | 1.531 | 1.516 | 1.664 | 0.95 | 0.214 | 2.876 | 0.988 | 0.971 | 1.006 | 991.531 |
| b_groupEnglishmono | conditional | 0.177 | 0.178 | 0.226 | 0.95 | -1.500 | 1.968 | 0.581 | 0.474 | 1.008 | 1141.779 |
| b_groupNonmulti | conditional | 2.302 | 2.293 | 2.392 | 0.95 | 0.544 | 4.104 | 0.992 | 0.987 | 1.005 | 1069.277 |
| b_groupSouthAsian | conditional | 0.028 | 0.050 | -0.075 | 0.95 | -1.594 | 1.894 | 0.516 | 0.414 | 1.005 | 1259.040 |
| b_groupSoutheastAsian | conditional | 0.067 | 0.047 | 0.068 | 0.95 | -1.627 | 1.944 | 0.523 | 0.425 | 1.005 | 1372.438 |
| b_lang_2East | conditional | 2.432 | 2.433 | 2.422 | 0.95 | 0.559 | 4.255 | 0.992 | 0.988 | 1.006 | 1231.306 |
| b_lang_2International | conditional | -0.755 | -0.767 | -0.690 | 0.95 | -2.680 | 1.036 | 0.793 | 0.713 | 1.004 | 1292.729 |
| b_lang_2South | conditional | 2.780 | 2.767 | 2.837 | 0.95 | 0.805 | 4.608 | 0.998 | 0.997 | 1.004 | 1306.428 |
| b_lang_2SouthEast | conditional | 3.280 | 3.283 | 3.423 | 0.95 | 1.550 | 5.264 | 1.000 | 0.999 | 1.006 | 1193.188 |
| b_groupEnglishmono:lang_2East | conditional | -0.605 | -0.595 | -0.756 | 0.95 | -3.027 | 1.843 | 0.679 | 0.609 | 1.005 | 1437.754 |
| b_groupNonmulti:lang_2East | conditional | -2.178 | -2.154 | -2.253 | 0.95 | -4.522 | 0.520 | 0.946 | 0.926 | 1.004 | 1363.266 |
| b_groupSouthAsian:lang_2East | conditional | -1.395 | -1.419 | -1.232 | 0.95 | -3.798 | 1.091 | 0.871 | 0.832 | 1.004 | 1569.788 |
| b_groupSoutheastAsian:lang_2East | conditional | 0.155 | 0.170 | 0.016 | 0.95 | -2.370 | 2.675 | 0.544 | 0.474 | 1.006 | 1646.251 |
| b_groupEnglishmono:lang_2International | conditional | 0.567 | 0.573 | 0.558 | 0.95 | -1.780 | 3.032 | 0.678 | 0.606 | 1.004 | 1549.684 |
| b_groupNonmulti:lang_2International | conditional | 0.354 | 0.346 | 0.587 | 0.95 | -2.128 | 2.798 | 0.609 | 0.536 | 1.003 | 1466.576 |
| b_groupSouthAsian:lang_2International | conditional | 2.643 | 2.636 | 2.582 | 0.95 | 0.059 | 4.941 | 0.982 | 0.971 | 1.003 | 1657.478 |
| b_groupSoutheastAsian:lang_2International | conditional | 3.485 | 3.504 | 3.517 | 0.95 | 1.001 | 5.976 | 0.997 | 0.994 | 1.002 | 1746.711 |
| b_groupEnglishmono:lang_2South | conditional | -1.718 | -1.719 | -1.564 | 0.95 | -4.205 | 0.848 | 0.911 | 0.879 | 1.004 | 1547.022 |
| b_groupNonmulti:lang_2South | conditional | -1.832 | -1.822 | -1.926 | 0.95 | -4.313 | 0.635 | 0.922 | 0.889 | 1.004 | 1516.314 |
| b_groupSouthAsian:lang_2South | conditional | -2.953 | -2.942 | -3.100 | 0.95 | -5.514 | -0.439 | 0.987 | 0.979 | 1.003 | 1758.951 |
| b_groupSoutheastAsian:lang_2South | conditional | 0.566 | 0.573 | 0.453 | 0.95 | -1.883 | 3.039 | 0.669 | 0.594 | 1.003 | 1676.928 |
| b_groupEnglishmono:lang_2SouthEast | conditional | 2.922 | 2.901 | 3.021 | 0.95 | 0.296 | 5.269 | 0.989 | 0.980 | 1.007 | 1384.854 |
| b_groupNonmulti:lang_2SouthEast | conditional | -2.969 | -2.967 | -2.916 | 0.95 | -5.481 | -0.545 | 0.992 | 0.985 | 1.007 | 1264.185 |
| b_groupSouthAsian:lang_2SouthEast | conditional | -0.548 | -0.540 | -0.502 | 0.95 | -3.062 | 1.900 | 0.666 | 0.596 | 1.004 | 1452.721 |
| b_groupSoutheastAsian:lang_2SouthEast | conditional | -1.407 | -1.401 | -1.474 | 0.95 | -3.919 | 1.097 | 0.865 | 0.819 | 1.005 | 1640.445 |
| sigma | sigma | 1.868 | 1.873 | 1.857 | 0.95 | 1.701 | 2.062 | 1.000 | 1.000 | 1.000 | 3495.151 |
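
The pd and interval columns in the table above summarize posterior draws. The following Python sketch shows one common way such summaries can be computed from a list of draws. The draws are invented, the function names are hypothetical, and the table itself may have been produced with somewhat different computational details.

```python
import math


def probability_of_direction(draws):
    """Larger of the shares of strictly positive and strictly negative draws."""
    positive = sum(1 for d in draws if d > 0)
    negative = sum(1 for d in draws if d < 0)
    return max(positive, negative) / len(draws)


def highest_density_interval(draws, mass=0.95):
    """Narrowest interval containing at least `mass` of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # draws the interval must contain
    # Slide a window of that size over the sorted draws, keep the narrowest.
    start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[start], ordered[start + window - 1]


# Invented posterior draws for a single coefficient.
draws = [-1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0]
print(probability_of_direction(draws))
print(highest_density_interval(draws, mass=0.5))
```

A pd close to 1 means that nearly all draws share one sign, which is how the rows with pd above roughly 0.97 in the table can be read.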